A simple data discretizer
Abstract
Data discretization is an important step in the machine learning process, since many classifiers handle discrete attributes more easily than continuous ones. Over the years, several discretization methods, such as Boolean reasoning, equal-frequency binning, and entropy-based discretization, have been proposed, explored, and implemented. In this article, a simple supervised discretization approach is introduced. It is a modified version of the supervised discretization algorithm Minimum Information Loss (MIL), whose prime goal is to maximize the classification accuracy of the classifier while minimizing the loss of information when discretizing continuous attributes. The performance of the suggested approach is compared with that of MIL using the state-of-the-art rule induction algorithm J48 (the Java implementation of the C4.5 classifier). The empirical results show that, in several cases, the modified approach performs better than both the original MIL algorithm and the Minimum Description Length Principle (MDLP).
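The abstract does not spell out the steps of MIL or of the modified approach, so the following is only a generic sketch of the two kinds of discretization it mentions, not the authors' method. `equal_frequency_cuts` is unsupervised equal-frequency binning; `best_entropy_cut` is a supervised step that picks the single threshold minimizing the weighted class entropy of the two resulting intervals (the splitting step used by entropy-based methods such as MDLP, shown here without MDLP's recursive splitting or its stopping criterion). All function names and the toy data are hypothetical.

```python
import math
from bisect import bisect_right
from collections import Counter


def equal_frequency_cuts(values, k):
    """Unsupervised: cut points so that each of k bins holds roughly len(values)/k items."""
    xs = sorted(values)
    n = len(xs)
    cuts = []
    for i in range(1, k):
        j = i * n // k
        # Boundary halfway between the last item of one bin and the first of the next.
        cut = (xs[j - 1] + xs[j]) / 2
        if not cuts or cut > cuts[-1]:  # skip duplicate cuts caused by tied values
            cuts.append(cut)
    return cuts


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_entropy_cut(values, labels):
    """Supervised: the single threshold that minimizes the weighted class entropy."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a cut between two equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_cut


def discretize(values, cuts):
    """Map each continuous value to the index of the interval it falls into."""
    return [bisect_right(cuts, v) for v in values]


# Hypothetical toy attribute with a binary class label.
ages = [22, 25, 30, 35, 40, 45, 50, 55]
buys = ["no", "no", "no", "yes", "yes", "yes", "yes", "yes"]

print(equal_frequency_cuts(ages, 4))            # [27.5, 37.5, 47.5]
print(best_entropy_cut(ages, buys))             # 32.5
print(discretize(ages, [32.5]))                 # [0, 0, 0, 1, 1, 1, 1, 1]
```

The supervised cut lands exactly where the class changes, which is why it yields two pure intervals, whereas equal-frequency binning ignores the labels and simply splits the sorted values into groups of equal size.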
Similar articles
A Novel Discretizer for Knowledge Discovery Approaches Based on Rough Sets
Knowledge discovery approaches based on rough sets have successful application in machine learning and data mining. As these approaches are good at dealing with discrete values, a discretizer is required when the approaches are applied to continuous attributes. In this paper, a novel adaptive discretizer based on a statistical distribution index is proposed to preprocess continuous valued attri...
Incremental Discretization and Bayes Classifiers Handles Concept Drift and Scales Very Well
Many data sets exhibit an early plateau where the performance of a learner peaks after seeing a few hundred (or less) instances. When concepts drift slower than the time to find that plateau, then a simple windowing policy and an incremental discretizer lets standard learners like Naïve Bayes classifiers scale to very large data sets. Our toolkit is simple to implement, can scale to millions of ...
A Bayesian Discretizer for Real-Valued Attributes
Discretization of real-valued attributes into nominal intervals has been an important area for symbolic induction systems because many real world classification tasks involve both symbolic and numerical attributes. Among various supervised and unsupervised discretization methods, the information gain based methods have been widely used and cited. This paper designs a new discretization method, c...
Data discretization: taxonomy and big data challenge
Discretization of numerical data is one of the most influential data preprocessing tasks in knowledge discovery and data mining. The purpose of attribute discretization is to find concise data representations as categories which are adequate for the learning task retaining as much information in the original continuous attribute as possible. In this article, we present an updated overview of di...
Geometry and surface-assisted micro flow discretization
This paper presents a micro flow discretization system that autonomously digitizes continuous liquid flow into nanoliter segments. Powered by the interactions between liquid flow and micro-channels, the discretization process does not consume any electricity or require any external control. In the prototype demonstration, the discretizer is made of PDMS microfluidic channels with desired geomet...
Journal: CoRR
Volume: abs/1710.05091
Publication date: 2017